A Programmer's Summary of 80386 CPU Enhancements

Daniel A. Norton
CHERRY HILL SOFTWARE
September, 1989

Copyright 1989, Daniel A. Norton. All Rights Reserved.

Permission is hereby granted for any individual or corporation to copy this publication provided that it is copied in whole and not in part, and provided that no charge is placed on its duplication beyond cost of labor and materials. This whole document includes the 16 pages from this cover page to page 16.

80386 CPU Enhancements
Copyright 1989, Daniel A. Norton
All Rights Reserved

Intel is a trademark of Intel Corporation.
Microsoft is a registered trademark of Microsoft Corporation.

First edition, September 10, 1989


INTRODUCTION

This summary describes the differences between the instructions of the Intel 80286 and 80386 processors. It is primarily intended for programmers who are already familiar with the 80286 instruction set. A more detailed description of the 80386 processor is presented in the 80386 Programmer's Reference Manual, order number 230985-001, published by Intel Literature, 800/548-4725.


REGISTER EXTENSIONS

The 80386 CPU extends the 80286 register set by expanding all of the 16-bit registers to 32 bits, with the exception of the segment registers. Two additional segment registers have been added, FS and GS.

As an example of the extension of the registers, EAX is a 32-bit register whose lower 16 bits are the familiar AX register. Although the upper and lower bytes of AX can be accessed independently through AH and AL, there is no register name for the upper 16 bits of EAX, so they cannot be accessed independently of the lower 16 bits. The other 32-bit registers are named after their corresponding 16-bit registers, with the letter "E" prefixed: EBX, ECX, EDX, ESI, EDI, EBP, ESP.
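
The register aliasing described above can be modeled in C. This is only an illustrative sketch, not 80386 code: the Reg32 type and the helper names are hypothetical and exist only to show which bits each register name covers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for one 32-bit general register such as EAX. */
typedef struct { uint32_t e; } Reg32;

/* AX: the low 16 bits of the register. */
uint16_t reg_x(const Reg32 *r) { assert(r != 0); return (uint16_t)(r->e & 0xFFFFu); }

/* AH: bits 8 through 15. */
uint8_t reg_h(const Reg32 *r) { assert(r != 0); return (uint8_t)((r->e >> 8) & 0xFFu); }

/* AL: bits 0 through 7. */
uint8_t reg_l(const Reg32 *r) { assert(r != 0); return (uint8_t)(r->e & 0xFFu); }

/* Writing AX leaves the upper 16 bits of the 32-bit register unchanged. */
void set_x(Reg32 *r, uint16_t v)
{
    assert(r != 0);
    r->e = (r->e & 0xFFFF0000u) | v;
}

/* The upper half has no name of its own; reaching it takes a shift
 * of the full 32-bit value. */
uint16_t upper_half(const Reg32 *r) { assert(r != 0); return (uint16_t)(r->e >> 16); }
```

With the model holding 12345678h, reg_x yields 5678h, reg_h yields 56h and reg_l yields 78h; calling set_x with ABCDh leaves 1234ABCDh.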

Figure 1, "32-bit EAX Register Layout," illustrates the relationship between EAX, AX, AH, and AL.

 31                       16 15         8 7           0
+---------------------------+------------+-------------+
|                           |            |             |
+---------------------------+------------+-------------+
<------------------------- EAX ------------------------>
                            <----------- AX ----------->
                             <--- AH ---> <--- AL ---->

                Figure 1. 32-bit EAX Register Layout

Unlike the other segment registers, the new segment registers, FS and GS, are never selected by default by any memory access instruction. To reference memory through FS or GS, a segment override must always be specified.


32-BIT OPERANDS AND ADDRESSES

Intel added the 32-bit memory and register references without adding new instruction op-codes for them. Instead, the CPU has an 80286-compatible 16-bit operation mode and a new 32-bit operation mode. The default is 16-bit mode, but this can be overridden in one of two ways:

1) By prefixing the instruction with an operand- or address-size override prefix.

2) In protect-mode, by specifying a 32-bit code segment.

The operand size may be 16 or 32 bits. An instruction that normally refers to AX will refer to EAX if the 32-bit operand size is specified. Similarly, the address size is normally 16 bits, but a 32-bit address is assumed if a 32-bit address size is specified.

When running in protect mode, one of the attributes of the code segment descriptor indicates whether the processor defaults to 16-bit or 32-bit mode. When operating in real-mode, addresses and operands always default to 16 bits.

The current default can be overridden by an override prefix on the instruction. The override affects only the instruction to which it is prefixed; the CPU reverts to the default mode on the following instruction (unless it is overridden again). The hexadecimal value for the operand-size override is 66h. The address-size override value is 67h.
These overrides may be used in combination to override both the address size and the operand size of the instruction that follows the prefixes.

For example, the instruction "MOV AX,BX" in real-mode or in a 16-bit protect-mode segment is coded in hexadecimal as "8B C3." In a 32-bit protect-mode segment, the same code, "8B C3," would represent "MOV EAX,EBX". To generate "MOV EAX,EBX" from real-mode or from a 16-bit protect-mode code segment, the instruction carries the operand-size prefix and is coded as "66 8B C3."


EFFECTIVE ADDRESSES

With the 80286, a memory operand could be referenced by its direct address, with an optional base using BX or BP and an optional index using SI or DI. When referencing a 32-bit address, the 80386 removes the restriction of specialized registers and allows any 32-bit general register to be used as a base and any 32-bit general register except ESP to be used as an index.

Furthermore, the value of the index register can be multiplied by 2, 4, or 8 before it is added into the effective address. Scaling the index in this way gives faster access to arrays whose entries are each 2, 4, or 8 bytes long. With the 80286, such references must be preceded by an instruction that multiplies the index to obtain the correct byte offset.

Although this new feature can be used to reference 16-bit and real-mode segments, the programmer must ensure that the upper 16 bits of the effective address are zero; otherwise a General Protection fault will occur (even in real mode).

This example of the LEA instruction uses every component of the new addressing form (base, scaled index, and displacement):

    LEA  EAX,[ECX+4*EDX+5]

This example also illustrates the new power of the LEA instruction on the 80386: it can evaluate certain first-degree polynomial expressions, here ECX + 4*EDX + 5, and place the result in a register that is not part of the expression.
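
The arithmetic that LEA performs for a base, scaled index and displacement can be sketched in C. This is a model under stated assumptions rather than processor code: effective_address and fits_16bit_segment are hypothetical helper names, and the second function only mirrors the "upper 16 bits must be zero" rule given above for 16-bit and real-mode segments.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: the value LEA would place in its destination for
 * [base + scale*index + disp], using 32-bit wraparound arithmetic. */
uint32_t effective_address(uint32_t base, uint32_t index,
                           uint32_t scale, int32_t disp)
{
    /* The index may be used unscaled or scaled by 2, 4, or 8. */
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
    return base + index * scale + (uint32_t)disp;
}

/* Hypothetical check: nonzero when the effective address is usable in a
 * 16-bit or real-mode segment (upper 16 bits are zero). */
int fits_16bit_segment(uint32_t ea)
{
    return (ea & 0xFFFF0000u) == 0;
}
```

For instance, with ECX = 100 and EDX = 7, the LEA example above corresponds to effective_address(100, 7, 4, 5), which is 100 + 28 + 5 = 133.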


PAGING

In addition to the protect-mode segmentation features available with the 80286, the 80386 CPU adds paging features. Paging can only be enabled in conjunction with segmentation. In other words, it can only be enabled when the processor is in protected mode.

When paging is disabled, the address produced by the segmentation logic is a physical memory address and references memory directly. When paging is enabled, however, the address produced by the segmentation logic is referred to as a linear address (sometimes referred to as a "logical address"). The linear address is the input to the paging logic, which converts it into a physical address.

One advantage of this extra address translation is to allow more than one task to occupy the same "logical space." For example, programs that run in virtual 8086 mode expect to see addresses from 0x00000 to 0xFFFFF. Without paging, only one task could occupy this range. With paging enabled, however, each task sees the same logical space, but the paging logic converts these addresses to different physical addresses.

Another advantage of this address translation is that it allows what is called demand-paging. A task's logical space can be very large, larger than the available physical memory. Only some of the logical space need be present in physical memory; the rest can be stored on disk. The paging logic can be programmed so that pages that are present are mapped to their corresponding physical memory addresses, while pages that are not present are marked as such. When the task references a present page, reads and writes access memory in the normal way. If a task references a page that is not present, however, the memory reference traps the processor so that the operating system can load the page from disk.
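
The translation and trap behavior just described can be illustrated with a deliberately simplified C model. Everything here is an assumption made for illustration: ToyPte, toy_translate and the 16-page table are hypothetical, and the flat table is much simpler than the structures the 80386 actually uses. Only the 4096-byte page size is taken from the processor.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE 4096u   /* fixed 80386 page size */
#define NUM_PAGES 16u     /* size of this toy linear space (assumption) */

/* Hypothetical, simplified page-table entry. */
typedef struct {
    int      present;   /* nonzero if the page is in physical memory */
    uint32_t frame;     /* physical page number, valid when present */
} ToyPte;

/* Translate a linear address. Returns 1 and stores the physical address
 * on success; returns 0 to model the trap taken on a not-present page. */
int toy_translate(const ToyPte *table, uint32_t linear, uint32_t *physical)
{
    uint32_t page   = linear / PAGE_SIZE;
    uint32_t offset = linear % PAGE_SIZE;

    assert(table != 0 && physical != 0);
    if (page >= NUM_PAGES || !table[page].present)
        return 0;                 /* the operating system must intervene */
    *physical = table[page].frame * PAGE_SIZE + offset;
    return 1;
}
```

Giving two tasks separate tables lets the same linear address, say 1234h, resolve to different physical addresses for each task, while a reference to an unmapped page returns 0, the point at which an operating system would load the page from disk.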

This demand-paging is similar to segment swapping, but differs in that all pages have the same fixed size (4096 bytes). This fixed size not only simplifies the logic required, but also allows for not-present pages within a larger object. For example, if an application requires a large object, say 64k bytes, the segment swapping method would require that all 64k of the object be present in memory when any part of the object is referenced. With paging, only the page that is referenced needs to be present. Of course, one could create segments with lengths limited to 4k bytes, but managing a single object over several segments can be extremely clumsy.


DEBUG REGISTERS

The 80386 CPU expands upon the debugging capabilities of "INT 3" and the trap flag by allowing traps on specific types of memory references. This allows the CPU to run normally until one of the specified breakpoint conditions is met.

The debug registers allow up to four breakpoints to be active at any one time. A breakpoint is specified by loading the debug registers with the following parameters for each breakpoint:

1) Reference Type:
   - Instruction Execution
   - Data Write
   - Data Read or Write

2) Reference Address:
   - 32-bit linear address

3) Reference Length:
   - 1, 2 or 4 bytes

Note that in protect-mode with paging enabled, the breakpoint address checking occurs before paging translation. With paging disabled, and in real mode, the breakpoint address is a physical 32-bit address.

The reference length for instruction execution breakpoints is always one (1), regardless of the number of bytes in the instruction. The reference address refers to the first byte of the instruction, including prefixes, if any.

If the reference length for data breakpoints is 2 or 4, the reference address must lie on a 2-byte or 4-byte boundary, respectively. Any data reference within the specified range causes the trap.


INSTRUCTION SUMMARY

BSF     Bit Scan Forward

The BSF instruction searches the specified 16- or 32-bit target for a "1" bit, starting at the lowest-order bit, and places the bit index in the specified destination register. The lowest-order bit has an index of zero (0); the highest-order bit has an index of 15 (for 16-bit targets) or 31 (for 32-bit targets).

In other words, this instruction searches from low to high, beginning at the lowest-order bit, and counts the number of zero bits before the first non-zero bit.

If all of the bits in the target are zero, the zero flag is set and the contents of the destination register are undefined; otherwise, the zero flag is cleared and the destination register contains the number of zero bits encountered.

EXAMPLES:

    BSF  BX,usWord      ; Find the first "1" bit in "usWord"
    BSF  EAX,EBX        ; Find the first "1" bit in EBX

BSR     Bit Scan Reverse

The BSR instruction searches the specified 16- or 32-bit target for a "1" bit, starting at the highest-order bit, and places the bit index in the specified destination register. The lowest-order bit has an index of zero (0); the highest-order bit has an index of 15 (for 16-bit targets) or 31 (for 32-bit targets).

In other words, this instruction searches from high to low and reports the index of the first non-zero bit it finds, which is the highest-order "1" bit in the target. The search begins at the highest-order bit and proceeds down to the lowest-order bit.

If all of the bits in the target are zero, the zero flag is set and the contents of the destination register are undefined; otherwise, the zero flag is cleared and the destination register contains the index of the highest-order "1" bit.

EXAMPLES:

    BSR  BX,usWord      ; Find the highest "1" bit in "usWord"
    BSR  EAX,EBX        ; Find the highest "1" bit in EBX

BT      Test Bit

The BT instruction copies the specified bit into the carry flag. The bit is specified with two operands: the operand that contains the bit, followed by the bit offset. When the first operand is a register, or the bit offset is an immediate value, the bit offset is truncated to the number of bits in the specified operand. When the first operand is in memory and the bit offset is in a register, the offset is not truncated: it is a signed bit index counted from bit 0 of the addressed operand, so the selected bit may lie outside that word or doubleword.

EXAMPLES:

    BT   AX,5           ; Test the 0020h bit
    BT   usWord,BX      ; Test bit BX, counted from bit 0 of usWord
    BT   EAX,17         ; Test the 00020000h bit
    BT   ulDWord,EAX    ; Test bit EAX, counted from bit 0 of ulDWord
    BT   AX,17          ; Test the 0002h bit (17 MOD 16 = 1)

BTC     Complement Bit

The BTC instruction copies the specified bit into the carry flag and then complements the bit. The bit is specified with two operands, exactly as for BT.

EXAMPLES:

    BTC  AX,5           ; Complement the 0020h bit
    BTC  usWord,BX      ; Complement bit BX, counted from bit 0 of usWord
    BTC  EAX,17         ; Complement the 00020000h bit
    BTC  ulDWord,EAX    ; Complement bit EAX, counted from bit 0 of ulDWord
    BTC  AX,17          ; Complement the 0002h bit

BTS     Set Bit

The BTS instruction copies the specified bit into the carry flag and then sets the bit to 1. The bit is specified with two operands, exactly as for BT.

EXAMPLES:

    BTS  AX,5           ; Set the 0020h bit
    BTS  usWord,BX      ; Set bit BX, counted from bit 0 of usWord
    BTS  EAX,17         ; Set the 00020000h bit
    BTS  ulDWord,EAX    ; Set bit EAX, counted from bit 0 of ulDWord
    BTS  AX,17          ; Set the 0002h bit

BTR     Reset Bit

The BTR instruction copies the specified bit into the carry flag and then resets the bit to 0. The bit is specified with two operands, exactly as for BT.

EXAMPLES:

    BTR  AX,5           ; Reset the 0020h bit
    BTR  usWord,BX      ; Reset bit BX, counted from bit 0 of usWord
    BTR  EAX,17         ; Reset the 00020000h bit
    BTR  ulDWord,EAX    ; Reset bit EAX, counted from bit 0 of ulDWord
    BTR  AX,17          ; Reset the 0002h bit

CDQ     Sign Extend EAX to EDX:EAX

The CDQ instruction extends the sign bit of EAX into all 32 bits of EDX. CDQ is the 32-/64-bit form of CWD.

CWDE    Sign Extend AX to EAX

The CWDE instruction extends the sign bit of AX into the upper 16 bits of EAX. CWDE is the 32-bit form of CBW.

Jcc     Jump Conditional

The Jcc instructions have been extended on the 80386 to allow a NEAR target address, reached with a 16- or 32-bit relative offset within the code segment (previously the target was restricted to an 8-bit relative offset). All conditional jump instructions except JCXZ have this capability.

Programming Tip: A "quirk" in the Microsoft assembler (MASM) defaults all forward label references to NEAR if the instruction allows it. With the 80286, the Jcc instructions did not allow NEAR offsets, so SHORT offsets were generated. With the 80386, NEAR offsets are allowed and are the default, even for Jcc instructions. To override this default, specify the SHORT modifier on forward target references unless you particularly require a NEAR reference. Otherwise, each forward Jcc will occupy 4 bytes in a 16-bit code segment or 6 bytes in a 32-bit code segment, as opposed to 2 bytes for the SHORT form.

LFS, LGS, LSS   Load Full Pointer

The LFS, LGS and LSS instructions are similar to the LDS and LES instructions, except that they load the FS, GS and SS registers, respectively.

EXAMPLE:

    LSS  SP,pStack      ; Load a new stack (SS:SP) from pStack

MOVSX   Sign Extend into Register

The MOVSX instruction copies the byte or word from the effective address into the destination register, extending the sign of the byte or word into the remaining upper bits of the register.

If the destination is a 16-bit register, the effective address refers to an 8-bit value that is sign extended. If the destination is a 32-bit register, the effective address may refer to an 8- or 16-bit value.

Programming Tip: CWDE may be used in place of MOVSX EAX,AX and CBW may be used in place of MOVSX AX,AL; the older forms have shorter encodings.

MOVZX   Zero Extend into Register

The MOVZX instruction copies the byte or word from the effective address into the destination register, and zero-extends the remaining bits in the destination register.

If the destination is a 16-bit register, the effective address refers to an 8-bit value; the upper 8 bits of the destination register are zeroed. If the destination is a 32-bit register, the effective address may refer to an 8- or 16-bit value; the upper 24 or 16 bits of the destination register, respectively, are zeroed.

Programming Tip: Instead of programming:

    MOV  AL,BYTE PTR x
    XOR  AH,AH

use:

    MOVZX AX,BYTE PTR x

SETcc   Set Byte on Condition

The SETcc instructions store a byte at the specified destination according to the specified condition. If the condition is TRUE, a 1 is stored; if the condition is FALSE, a 0 is stored. The condition codes for SETcc are the same as those for the conditional jump instructions.

SHLD    Shift Left Double Precision

The SHLD instruction shifts the specified 16- or 32-bit target operand to the left by the specified number of bits. Bits are shifted in on the right from the high-order end of the specified source register, which remains unaltered.

EXAMPLES:

    SHLD usWord,AX,5    ; Shift "usWord" left 5 bits from AX
    SHLD EBX,ECX,CL     ; Shift EBX left "CL" bits from ECX

SHRD    Shift Right Double Precision

The SHRD instruction shifts the specified 16- or 32-bit target operand to the right by the specified number of bits. Bits are shifted in on the left from the low-order end of the specified source register, which remains unaltered.

EXAMPLES:

    SHRD usWord,AX,5    ; Shift "usWord" right 5 bits from AX
    SHRD EBX,ECX,CL     ; Shift EBX right "CL" bits from ECX
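
The effect of the 32-bit forms of SHLD and SHRD on the target operand can be sketched in C. The functions shld32 and shrd32 are hypothetical names for this illustration; the sketch models only the resulting value for shift counts of 0 through 31 and ignores the flags.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of 32-bit SHLD: shift dest left by count bits,
 * filling the vacated low bits from the high-order end of src. */
uint32_t shld32(uint32_t dest, uint32_t src, unsigned count)
{
    assert(count < 32);       /* this model covers counts 0..31 only */
    if (count == 0)
        return dest;          /* avoids a shift by 32, undefined in C */
    return (dest << count) | (src >> (32 - count));
}

/* Hypothetical model of 32-bit SHRD: shift dest right by count bits,
 * filling the vacated high bits from the low-order end of src. */
uint32_t shrd32(uint32_t dest, uint32_t src, unsigned count)
{
    assert(count < 32);
    if (count == 0)
        return dest;
    return (dest >> count) | (src << (32 - count));
}
```

With a target of 12345678h, a source of 9ABCDEF0h and a count of 8, shld32 gives 3456789Ah and shrd32 gives F0123456h. In both cases the source value is only read, matching the statement that the source register remains unaltered.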